Bioinformatics A Practical Guide to Next Generation Sequencing Data Analysis (Hamid D. Ismail)

--i-reference-sequences inputs/97_otus-GG_db.qza \
--p-perc-identity 0.97 \
--o-clustered-table closed_ref_cl_97/table-yoga-closed_cl.qza \
--o-clustered-sequences closed_ref_cl_97/rep-seqs-yoga-close_cl.qza \
--o-unmatched-sequences closed_ref_cl_97/unmatched-yoga-close_cl.qza

The above script outputs three artifacts: a feature table, clustered sequences (the sequences defining the features in the feature table), and unmatched sequences (the sequences that did not match any reference sequence at 97% identity). The unmatched sequences are ignored completely: they do not appear in the feature table.
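Before discarding the unmatched sequences, it can be useful to check how many there are. The following is a minimal sketch, assuming the closed-reference command above has already been run and that the `feature-table` plugin is available in the active QIIME 2 environment. The `summaries` subdirectory is a name chosen here for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: summarize the closed-reference outputs, if they exist.
out_dir="closed_ref_cl_97"
summary_dir="${out_dir}/summaries"   # hypothetical location for the visualizations
mkdir -p "$summary_dir"

if command -v qiime >/dev/null 2>&1 \
   && [ -f "${out_dir}/table-yoga-closed_cl.qza" ] \
   && [ -f "${out_dir}/unmatched-yoga-close_cl.qza" ]; then
  # Per-sample and per-feature counts of the clustered feature table
  qiime feature-table summarize \
    --i-table "${out_dir}/table-yoga-closed_cl.qza" \
    --o-visualization "${summary_dir}/table-yoga-closed_cl.qzv"
  # Listing of the sequences that did not match the reference at 97% identity
  qiime feature-table tabulate-seqs \
    --i-data "${out_dir}/unmatched-yoga-close_cl.qza" \
    --o-visualization "${summary_dir}/unmatched-yoga-close_cl.qzv"
else
  echo "qiime or the clustering outputs were not found; skipping summaries"
fi
```

The resulting `.qzv` files can be opened with `qiime tools view`. A large unmatched set may suggest that the reference database covers the samples poorly.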

7.3.4.2.1.3.3 Open-Reference Clustering

Open-reference clustering is a hybrid of the above two clustering methods. First, it clusters the sequences that match the reference sequences, and then it performs de novo clustering on the unmatched sequences. Open-reference clustering is performed with the “cluster-features-open-reference” method. The input and output artifacts are the same as those of closed-reference clustering, except that there is no artifact for unmatched sequences; instead, there is an artifact for the new reference sequences, which contains the reference sequences used as input plus the sequences clustered in the internal de novo clustering step. We will create the new subdirectory “open_ref_cl_97” for the files of the open-reference clustering.

mkdir open_ref_cl_97

qiime vsearch cluster-features-open-reference \
--i-table inputs/derep-yoga-table.qza \
--i-sequences inputs/derep-yoga-seqs.qza \
--i-reference-sequences inputs/97_otus-GG_db.qza \
--p-perc-identity 0.97 \
--o-clustered-table open_ref_cl_97/table-yoga-open_cl.qza \
--o-clustered-sequences open_ref_cl_97/rep-seqs-yoga-open_cl.qza \
--o-new-reference-sequences open_ref_cl_97/new-ref-seqs-open_cl.qza
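The new reference sequences can serve as the reference for a later open-reference run, so that features are defined consistently across feature tables. The sketch below illustrates this under stated assumptions: the second dataset (`derep-batch2-table.qza`, `derep-batch2-seqs.qza`) and the output directory `open_ref_cl_97_batch2` are hypothetical names that are not part of this exercise.

```shell
#!/usr/bin/env bash
# Sketch: cluster a second, hypothetical dataset against the new reference
# sequences written by the first open-reference run.
new_ref="open_ref_cl_97/new-ref-seqs-open_cl.qza"
batch2_table="inputs/derep-batch2-table.qza"   # hypothetical input
batch2_seqs="inputs/derep-batch2-seqs.qza"     # hypothetical input
batch2_out="open_ref_cl_97_batch2"             # hypothetical output directory

if command -v qiime >/dev/null 2>&1 \
   && [ -f "$new_ref" ] && [ -f "$batch2_table" ] && [ -f "$batch2_seqs" ]; then
  mkdir -p "$batch2_out"
  qiime vsearch cluster-features-open-reference \
    --i-table "$batch2_table" \
    --i-sequences "$batch2_seqs" \
    --i-reference-sequences "$new_ref" \
    --p-perc-identity 0.97 \
    --o-clustered-table "${batch2_out}/table-batch2-open_cl.qza" \
    --o-clustered-sequences "${batch2_out}/rep-seqs-batch2-open_cl.qza" \
    --o-new-reference-sequences "${batch2_out}/new-ref-seqs-batch2-open_cl.qza"
  status="ran"
else
  status="skipped"
  echo "qiime or the required artifacts were not found; skipping"
fi
```

The script only runs the clustering when QIIME 2 and all three input artifacts are present; otherwise it prints a message and does nothing.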

All three clustering methods take a dereplicated feature table and representative sequences as input and produce a final feature table and OTU representative sequences, which are used in the downstream analysis for phylogeny, diversity analysis, taxonomic assignment, and differential taxonomic analysis.

7.3.4.2.2 Denoising

Like clustering, denoising produces a feature table and representative sequences. However, denoising attempts to remove sequencing errors and thus to provide more accurate results. There are two denoising methods available in QIIME2: DADA2 and deblur. Both methods output feature tables containing feature abundances and ASVs. Moreover, they also